ABSTRACT

Recent years have witnessed the explosion of online social networks
(OSNs). They provide powerful IT innovations for online social
activities such as organizing contacts, publishing content, and
sharing interests among friends who may never have met in person. As
more and more people become active users of online social networks,
one may ponder questions such as: (1) Do OSNs indeed improve our
sociability? (2) To what extent can we expand our offline social
spectrum in OSNs? (3) Can we identify interesting user behaviors in
OSNs? This paper aims to answer these questions. To this end, we
revisit the well-known Dunbar's number in online social networks.
Our main research contributions are as follows. First, to the best of
our knowledge, our work is the first to systematically validate the
existence of an online Dunbar's number in the range [200, 300]. To
reach this result, we combine local-structure analysis and
user-interaction analysis on extensive real-world OSN datasets.
Second, we divide OSN users into two categories, rational and
aggressive, and find that rational users tend to develop close and
reciprocated relationships, whereas aggressive users show no
consistent behavior. Third, we build a simple model to capture the
constraints of time and cognition that affect the evolution of online
social networks. Finally, we show the potential use of our findings
in viral marketing and privacy management in online social networks.
1. INTRODUCTION

Online social sites such as Facebook^1, LiveJournal^2, and MySpace^3
provide people with a powerful new means to communicate and interact
with each other. Through these sites, users can share blogs, photos,
and status updates. They can consolidate real-world friendships by
exchanging information online and establish new virtual friendships
with others on the same site. These sites have led to the formation
of a new kind of social network, the online social network. Indeed,
with the rapid development of online social sites over the past
decade, the online social network has become an essential part of
daily life and is potentially changing our social behavior. At the
same time, unlike the traditional real-world social network, the
electronic communication data of the online social network is
relatively easy to collect [5], and its scale is far larger. It is
therefore reasonable to conjecture that this new form of network can
offer new insights into our existing understanding of social
networks.

By relating the number of friendships to the size of the neocortex in
primates, Dunbar found that humans can effectively maintain at most
150 friendships [6]. The number 150 has since been called the magic
number of social networks. In our experience of using online social
networks, it is easy to find users with an extremely large number of
friends, far more than 150, while others keep only an average number
of friends. We believe that online mechanisms facilitate the
formation of high-degree nodes, since making a friend is so
convenient that it requires only an invitation and an acceptance. We
may therefore ask whether online social networks escape the
constraint of Dunbar's number. To settle this, it is necessary to
investigate the following questions:

• Does a magic number still exist in online social networks, as
Dunbar's number does in real-world social networks?

• If it exists, what is its value and how does it arise?

• How does it change things?

1 http://www.facebook.com
2 http://www.livejournal.com
3 http://www.myspace.com

In this paper, we aim to answer the above questions by analyzing
several datasets from online social sites. By observing a number of
local measures, we conclude that a new magic number exists
pervasively; it lies in the range of 200 to 300, greater than the 150
previously found in real-world social networks. We also validate this
by investigating traces of interaction between users of online social
sites. Our observations show that although online social sites
provide many easier ways to maintain friendships, there is still an
upper limit on the number of substantial and meaningful friends.
Furthermore, we believe that users can be distinguished by the magic
number, and that users on either side of it exhibit different
behaviors and attitudes.

Because many current models cannot explain these phenomena, we
present a simple new model to explain how the magic number in online
social networks arises. Finally, we argue that this number offers
insight for viral marketing strategy and user privacy management. For
instance, hub nodes may not be an effective choice in viral
marketing, and certain users call for a fine-grained privacy setting
mechanism.

The rest of the paper is organized as follows. Section 2 introduces
related work. Section 3 defines the local and global measures used in
the subsequent analysis. Section 4 presents our observations and
findings. Section 5 presents a new model to explain the new upper
limit that exists pervasively in online social networks. Section 6
discusses the business insights offered by the new magic number.
Finally, Section 7 concludes the paper.

2. RELATED WORK 

Our study is related to the work in three areas: the phe- 
nomenon of Dunbar's number, measurement analysis of on- 
line social networks and social network modeling. 

2.1 The phenomenon of Dunbar's number 

By investigating the relationship between neocortex size and group
size in primates, Dunbar [6] predicted that the group size for human
beings is 150, a value that became known as Dunbar's number.
According to him, human beings can maintain only a limited set of
relationships, those within the circle of Dunbar's number;
relationships beyond that circle are not reciprocated or
personalized. Dunbar's number was originally put forward for
real-world social networks. However, recent work on online social
networks has reported similar observations [HE]. Roberts et al.
pointed out that time spent using social media, including online
social sites, was not associated with larger offline networks [20].
Potential time and cognitive constraints were also considered in
their work. Other work on online social networks related to Dunbar's
number is discussed in Section 2.2.

2.2 Measurement analysis of online social networks

Researchers have recently studied online social networks intensively,
measuring their properties from different perspectives. Phenomena
such as the small-world effect, power-law degree distributions, high
clustering, and assortativity have been observed on different social
sites and are believed to be common properties of online social
networks. Ahn et al. studied Cyworld, the largest online social
network in South Korea [1]. They experimented on the complete Cyworld
data and discovered some unique characteristics of the site. In
particular, they found that most user connections were not active and
attributed this to Dunbar's number. Mislove et al. conducted a
large-scale measurement analysis using data from Flickr, YouTube,
LiveJournal, and Orkut [17]. They applied various complex network
measurements, such as degree distribution, clustering coefficients,
degree correlations, and the connected core. Golder et al. analyzed
Facebook users in North American colleges and universities [7]. Their
results on degree distribution showed that the number of people with
a few hundred friends remained stable but started to drop sharply
once the number of friends exceeded 250, which also echoes Dunbar's
number.

2.3 Social networks modeling 

Although the Small-World [24] and BA [2] network models lay a
foundation for complex network research, these two models cannot
explain all the phenomena observed in different types of real
networks. For social networks, multiple models have been proposed to
fit their particular properties. Holme and Kim added a "triad
formation step" to the BA model (HK), i.e., establishing edges
between neighbors of a node [pj]. Davidsen et al. introduced a similar
process in their DEB model [4]. This process corresponds to the
real-world situation in which people are easily introduced to friends
of their friends and build connections with them. Networks generated
by both HK and DEB have power-law and high-clustering properties. Jin
et al. carefully studied the principles of social network formation
and proposed a social network model (JGN) [13]. JGN also considered
the influence of mutual friends, the cost of maintaining friendships,
and a maximum number of connections. JGN and HK are two early models
for generating social networks, and both capture the core principle
of network generation: "transitivity". Many subsequent models
therefore inherit the idea of "transitivity" from them. The models
mentioned above mainly focus on real-world social networks and do not
take online factors into account. Although online social networks
have gained popularity in recent research, work on modeling this kind
of network is still insufficient. Yuta et al. pointed out that the
cost of maintaining online friendships is much lower, and they
extended CCN [22] with a random linkage process to form a new model,
CCNR [25]. Bonato et al. also adopted transitivity in their model
ILT [3].

In summary, almost all social network models, whether online or
offline, adopt the rule of transitivity in some form, and as a result
the networks they generate commonly exhibit high clustering.

3. PRELIMINARIES 

In this section, we define the global and local measures used in the
following sections.

An online social network can be intuitively modeled as a graph
$G(V, E)$, where $V$ is the set of users and $E$ is the set of ties.
Because establishing a new tie usually requires mutual permission on
online social sites, $G$ is undirected.



The number of a node's friendships is defined as its degree. The
average degree of the network is defined as

$$\langle k \rangle = \sum_{k=k_{\min}}^{k_{\max}} k \, p(k), \tag{1}$$

where $k_{\max}$ is the maximum degree among all the nodes, $k_{\min}$
is the minimum degree, and $p(k)$ is the degree distribution of the
graph. For online social networks, $p(k)$ typically follows a power
law. The complementary cumulative distribution function (CCDF) is
usually used to characterize it.
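
The degree-based quantities above can be illustrated with a minimal sketch. It assumes the network is given as a list of undirected ties without duplicates, computes the average degree as the $p(k)$-weighted mean of $k$, and takes the CCDF as $P(K \ge k)$ (some authors use a strict inequality instead). The function names and the toy edge list are hypothetical and not taken from the datasets used in this paper.

```python
from collections import Counter

def degrees(edges):
    """Return {node: degree} for an undirected edge list without duplicates."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def degree_distribution(deg):
    """p(k): fraction of nodes whose degree equals k."""
    n = len(deg)
    counts = Counter(deg.values())
    return {k: c / n for k, c in sorted(counts.items())}

def ccdf(p):
    """P(K >= k) for every observed degree k (assumed CCDF convention)."""
    out, tail = {}, 0.0
    for k in sorted(p, reverse=True):
        tail += p[k]
        out[k] = tail
    return dict(sorted(out.items()))

def average_degree(p):
    """Average degree as the p(k)-weighted mean of k."""
    return sum(k * pk for k, pk in p.items())

# Toy graph (hypothetical data): a triangle 0-1-2 plus a pendant node 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
p = degree_distribution(degrees(edges))
print("p(k):", p)
print("<k>:", average_degree(p))
print("CCDF:", ccdf(p))
```

On a log-log plot, a pure power law would make the CCDF a straight line, which is why the CCDF is a convenient way to look for a turning point.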

The clustering coefficient of a node characterizes how closely its
neighbors are connected. It is defined as

$$c_i = \frac{2|E_i|}{k_i (k_i - 1)}, \tag{2}$$

where $E_i$ is the set of ties between $i$'s neighbors and $k_i$ is
the degree of $i$. For the case of $k_i = 1$, we set $c_i = 0$ in this
paper. The average clustering coefficient of the nodes with degree
$k$ is then defined as

$$C(k) = \frac{\sum_{\{i \in V \mid k_i = k\}} c_i}{|\{i \in V \mid k_i = k\}|}. \tag{3}$$

The average clustering coefficient of the network is defined as

$$C = \frac{1}{|V|} \sum_{i \in V} c_i. \tag{4}$$

The average clustering coefficient of a social network is typically
higher than that of a technological network.
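
As a hedged illustration of the clustering measures, the sketch below computes the standard local clustering coefficient (twice the number of ties among a node's neighbors, divided by $k_i(k_i - 1)$), its average per degree, and its average over the network. Setting the coefficient to 0 for nodes with fewer than two neighbors is an assumed convention, and the function names and toy graph are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def local_clustering(adj, i):
    """2|E_i| / (k_i (k_i - 1)); 0 when k_i < 2 (assumed convention)."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each tie among i's neighbors is counted once from each endpoint.
    links = sum(len(adj[u] & nbrs) for u in nbrs) // 2
    return 2.0 * links / (k * (k - 1))

def clustering_by_degree(adj):
    """C(k): mean local clustering over the nodes with degree k."""
    groups = defaultdict(list)
    for i in adj:
        groups[len(adj[i])].append(local_clustering(adj, i))
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

def average_clustering(adj):
    """C: mean local clustering over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Toy graph (hypothetical data): a triangle 0-1-2 plus a pendant node 3.
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print("C(k):", clustering_by_degree(adj))
print("C:", average_clustering(adj))
```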

The average degree of a node's neighbors, denoted $k_{nn}$, is
commonly used to describe the assortativity of a network. If the
network is disassortative, low-degree nodes are preferentially
connected to high-degree nodes, so $k_{nn}$ decreases as the degree
increases. Conversely, nodes are connected to others of similar
degree when the network is assortative. Social networks are usually
thought to be assortative. We define $k_{nn}$ of node $i$ as

$$k_{nn,i} = \frac{1}{k_i} \sum_{j \in N_i} k_j, \tag{5}$$

where $N_i$ denotes the set of $i$'s neighbors. Similarly, the average
$k_{nn}$ of the nodes with degree $k$ is defined as

$$k_{nn}(k) = \frac{\sum_{\{i \in V \mid k_i = k\}} k_{nn,i}}{|\{i \in V \mid k_i = k\}|}. \tag{6}$$

It can be normalized by dividing by $k_{\max}$.
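
The neighbor-degree measure can be sketched in the same style. The code below assumes that a node's $k_{nn}$ is the arithmetic mean of its neighbors' degrees and that the per-degree curve is the mean over nodes of equal degree; the optional normalization divides by the largest degree. All names and the toy graph are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def neighbor_degree(adj, i):
    """k_nn of node i: mean degree of i's neighbors."""
    return sum(len(adj[j]) for j in adj[i]) / len(adj[i])

def knn_by_degree(adj, normalize=False):
    """k_nn(k): mean k_nn over nodes with degree k, optionally / k_max."""
    groups = defaultdict(list)
    for i in adj:
        groups[len(adj[i])].append(neighbor_degree(adj, i))
    scale = max(groups) if normalize else 1
    return {k: sum(v) / len(v) / scale for k, v in sorted(groups.items())}

# Toy graph (hypothetical data): a triangle 0-1-2 plus a pendant node 3.
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print("k_nn(k):", knn_by_degree(adj))
print("normalized:", knn_by_degree(adj, normalize=True))
```

In this toy graph the curve falls as the degree grows, which is the disassortative pattern; an assortative network would show a rising curve.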

The k-shell (k-core) index, denoted $k_s$, is usually used to
characterize how far a node is from the core of the network: a
greater value of $k_s$ means the node is closer to the core. It can
be obtained by the following method [14]. First, remove all nodes
with degree $k = 1$. After this stage of pruning, new nodes with
$k = 1$ may appear. Keep pruning these nodes as well, until no node
with degree $k = 1$ remains. The $k_s$ of the removed nodes is set to
1. Next, repeat the pruning process in the same way for nodes with
degree $k = 2$, and subsequently for higher values of $k$, until all
nodes are removed. In [T3j, it is found that in many networks,
including online social networks, high-degree nodes may have low
$k_s$, indicating that those nodes are at the periphery of the
network. The average k-shell index of the nodes with degree $k$ is
defined as

$$k_s(k) = \frac{\sum_{\{i \in V \mid k_i = k\}} k_s^i}{|\{i \in V \mid k_i = k\}|}, \tag{7}$$

where $k_s^i$ is the k-shell index of $i$.
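
The pruning procedure for the k-shell index can be sketched as follows. This is a straightforward, unoptimized rendering of the iterative pruning described above, assuming a graph with no isolated nodes; nodes whose remaining degree falls to $k$ or below during stage $k$ are removed and assigned shell $k$. The names and toy graphs are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def k_shell(adj):
    """Return {node: k-shell index} by repeated pruning."""
    remaining = {u: set(vs) for u, vs in adj.items()}  # working copy
    shell = {}
    k = 1
    while remaining:
        while True:
            # Nodes whose current degree does not exceed k are pruned.
            prune = [u for u, vs in remaining.items() if len(vs) <= k]
            if not prune:
                break
            for u in prune:
                shell[u] = k
                for v in remaining[u]:
                    remaining[v].discard(u)
                del remaining[u]
        k += 1
    return shell

# Toy graphs (hypothetical data).
triangle_plus_tail = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
star = build_adjacency([("hub", leaf) for leaf in range(1, 6)])
print(k_shell(triangle_plus_tail))
print(k_shell(star))
```

The star example shows why degree and k-shell index can disagree: the hub has the highest degree, yet once its leaves are pruned it is left with no ties and receives the lowest shell index.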

The strength of a tie between two nodes in a social network is
usually defined as the overlap of their friends [8, 12]: the more
common friends two users share, the more familiar they are with each
other. In online social networks, sharing more common friends usually
means that two users are geographically close, share similar
profiles, or interact more frequently online. Online friends with a
high tie strength may have a higher probability of being friends in
offline social networks. We define the strength of the tie between
$i$ and $j$ as

$$w_{ij} = \frac{c_{ij}}{(k_i - 1) + (k_j - 1) - c_{ij}}, \tag{8}$$

where $c_{ij}$ is the number of common friends of nodes $i$ and $j$,
and $k_i$ and $k_j$ are the degrees of $i$ and $j$, respectively.
Based on the definition of tie strength, we can also define the
strength of a node as the average strength of all the ties connected
to it:

$$w_i = \frac{1}{k_i} \sum_{j \in N_i} w_{ij}, \tag{9}$$

where $N_i$ denotes the set of $i$'s neighbors. The average strength
of the nodes with degree $k$ is then defined as

$$w(k) = \frac{\sum_{\{i \in V \mid k_i = k\}} w_i}{|\{i \in V \mid k_i = k\}|}. \tag{10}$$
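
A minimal sketch of tie strength as friend overlap follows. It assumes the common neighborhood-overlap form, in which the number of common friends is divided by $(k_i - 1) + (k_j - 1)$ minus the number of common friends, and it sets the strength to 0 when that denominator is zero (an isolated tie). Both choices are assumptions made for illustration, and the names and toy graph are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def tie_strength(adj, i, j):
    """Overlap of the friend sets of i and j (assumed definition)."""
    common = len(adj[i] & adj[j])
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - common
    return common / denom if denom > 0 else 0.0

def node_strength(adj, i):
    """Mean strength of the ties attached to node i."""
    return sum(tie_strength(adj, i, j) for j in adj[i]) / len(adj[i])

def strength_by_degree(adj):
    """w(k): mean node strength over the nodes with degree k."""
    groups = defaultdict(list)
    for i in adj:
        groups[len(adj[i])].append(node_strength(adj, i))
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

# Toy graph (hypothetical data): a triangle 0-1-2 plus a pendant node 3.
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print("w_01:", tie_strength(adj, 0, 1))
print("w(k):", strength_by_degree(adj))
```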
4. A NEW MAGIC NUMBER

In this section, we start from observations on the local measures of
a sampled online social network from Facebook. By relating the
variation of local measures to the increase of degree, we discover an
interesting phenomenon. We then find that this phenomenon exists
pervasively in other online social networks. We summarize these
observations to formulate our conjectures. At the end of this
section, we validate them by investigating real traces of online
interactions between users.

4.1 The turning point on local measures 

The sample dataset comes from [23] and is publicly available. It is a
snapshot of the Facebook network in the city of New Orleans, so we
denote it as NewOrleans. It contains 63292 nodes and 816886 ties,
with $k_{\max} = 1098$, $k_{\min} = 1$, $\langle k \rangle = 25.8$,
and $C = 0.22$.
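
As a quick sanity check, assuming the graph is simple and undirected as defined in Section 3, every tie contributes to the degree of exactly two nodes, so the average degree equals $2|E|/|V|$. Plugging in the node and tie counts reported for NewOrleans gives

$$\langle k \rangle = \frac{2|E|}{|V|} = \frac{2 \times 816886}{63292} = \frac{1633772}{63292} \approx 25.8,$$

which agrees with the reported average degree.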

We first analyze the degree distribution of the NewOrleans network.
As shown in Figure 1, we surprisingly find a gentle slope over the
interval [0, 200]. Unlike the straight line of a typical power law, a
turning point clearly appears, and the power law holds only in the
tail.

We calculate the fraction of users in the gentle-slope interval and
find that more than 94% of users fall within it. Why are most users
so "crowded" into this narrow interval, while the number of users
begins to drop dramatically beyond it?

In fact, a similar degree distribution has already been observed by
Golder et al. in [7]. The turning point in their dataset is close to
250, and they argue that it arises because friendship on Facebook
cannot completely represent conventional friendship. Nevertheless,
they did not explore the problem further to give a more detailed
explanation.

Figure 1: Degree distribution of NewOrleans

To sum up, a node degree $k \in [200, 300]$ marks a threshold, and
the distributions differ on the two sides of it. Having revealed the
threshold, we want to know whether any hidden facts lie behind it.
Recalling Dunbar's number from Section 2.1, we can see that the
threshold is not far from Dunbar's number of 150. So does Dunbar's
number play a vital role in this phenomenon? Does it persist, or
shift to a new magic number, in online social networks? To provide
more concrete evidence, we examine how $C(k)$, $k_{nn}(k)$, and
$w(k)$ relate to $k$ in this network to see whether the turning point
also appears in these measurements. Our observations and remarks are
as follows.

Observation 1. We plot the distribution of the clustering coefficient
in Figure 2(a). $C(k)$ decreases steadily as $k$ increases at first;
however, when $k$ exceeds a certain threshold in the region
[200, 300], the rate of decrease rises noticeably. The turning point
therefore also exists in $C(k)$, with almost the same value as in the
degree distribution. We denote it as $k_T$, with
$k_T \in [200, 300]$.

Remark 1. The clustering coefficient reflects the connections among
the neighbors of a node; a high clustering coefficient indicates a
tightly connected neighborhood. In view of Observation 1, we can
conclude that users with lower degrees ($k < k_T$) have a
well-connected neighborhood, i.e., a large fraction of these users'
friends are acquainted with each other. This is not hard to explain
in real social networks, as a person often becomes acquainted with
strangers through one of his friends. The behavior of meeting
friends' friends is even strengthened in online social networks,
since they enable users to meet others with no restriction in time or
space. For instance, Facebook users receive notifications in their
News Feed when their friends establish a friendship with another
user, and they can click "add as friend" to become friends as well.
Moreover, almost all social sites provide a Common Friends feature,
which lists other users who have friends in common with you. It is
this kind of friend-making mechanism that leads to a denser network
for lower-degree users. In contrast, this does not hold for users
with high degrees. Their average clustering coefficient drops to a
very low point, meaning that although these users have hundreds of
friends, they do not actually know them well. Consequently, we
believe that most friends of these users seldom make new friends
through them, which leaves them with loose neighborhoods. It seems
that users are divided into two types by the turning point $k_T$: one
type is positioned in an acquaintance network and maintains
meaningful connections, and the other type keeps merely formal
relationships.

Observation 2. Figure 2(b) displays the distribution of degree
correlations, whose trend represents the assortativity of the
network. Interestingly, there is a positive trend below $k_T$, and
beyond that $k_{nn}(k)$ scatters with a slight negative trend. The
same trend of $k_{nn}(k)$ has also been observed in Mixi, an online
social site in Japan [25].

Remark 2. The positive trend of $k_{nn}(k)$ in the first stage is
consistent with the assortativity of offline social networks [19]. In
this stage, nodes tend to connect to those with similar degrees.
However, the negative scattering that sets in at the turning point
within [200, 300] shows that the network becomes disassortative
there. That is to say, users with degrees higher than $k_T$ are not
preferentially connected to nodes with similar degrees. The results
in Observation 2 suggest that users on the two sides of the turning
point behave differently in establishing friendships. On the left
side, users are prone to connect to other users with a similar number
of friends, while users on the right side may randomly add a large
number of friends without much consideration. The results from degree
correlations provide further evidence of the distinction between
users in online social networks.

Observation 3. A turning point also appears in Figure 2(c), again at
the value $k_T$. Before reaching the turning point, $w(k)$ remains
high and increases slightly as $k$ grows, but it begins to decrease
once $k$ exceeds $k_T$.

Remark 3. Previous work [12] has shown that in social networks such
as mobile communication networks, the tie strength $w_{ij}$ between
nodes $i$ and $j$ is strongly related to the frequency of interaction
between the users. It has also been found that in online social
networks, nodes with more mutual friends tend to trust each other and
share more similar interests [17]. The average tie strength $w_i$ of
user $i$ can therefore indicate, to some extent, the overall quality
of the user's relationships with friends. In view of Observation 3,
users with degrees lower than $k_T$ keep a high tie strength,
suggesting that these users maintain high-quality friendships. In
contrast, for users with degrees higher than $k_T$, the strength is
weak and becomes even weaker as $k$ increases. We therefore infer
that some of their friendships are fragile and not trusted. On online
social sites, this situation possibly corresponds to the following
scenarios:

• High-degree users are genuinely popular and attract many
low-degree nodes to add them as friends. For example, they may be a
movie star, a notable scientist, or even a famous enterprise.
However, they do not communicate frequently with those "new" friends.
The ties between them therefore become weak, and some may eventually
vanish.

• It is human nature to pursue prestige in society, though the forms
may differ. Some people are eager to gain popularity by randomly
sending thousands of friend invitations in online social networks.
They can probably acquire thousands of online friends as time goes
by; however, most of the relationships are meaningless and the number
of shared friends is certainly quite small.

Figure 2: The variations of local measures for NewOrleans.
(a) $C(k)$ (b) $k_{nn}(k)$ (c) $w(k)$

Surprisingly, the same turning point appears in all the measurements,
coinciding with what we found in the degree distribution. Based on
the above observations and remarks, we conjecture that in the dataset
we used there does exist a threshold $k_T \in [200, 300]$. This magic
threshold distinguishes users by the variation of their local
topological properties in online social networks. Furthermore, we
have found that users' online behaviors can be characterized by these
properties.

As Dunbar's number implies, an individual cannot effectively maintain
more than 150 friendships in the real world because of time and
cognitive constraints. The previous observations reveal that users
beyond the turning point behave quite differently. They keep a loose
neighborhood in which only a few friends know each other, and they
connect randomly to users of different degrees or demographics with
no preference; more importantly, the weak average tie strength of
these users indicates poor relationships with their friends. Given
these facts, the turning point $k_T$ seems to play a role in online
social networks similar to that of Dunbar's number in real society.
Although making new friends and maintaining friendships is cheap in
online social networks, the number of friendships one can handle is
still limited. In view of this, we infer that the turning point $k_T$
is the upper limit of well-maintained online friendships, and that
users can be divided into two categories based on this point.
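
The division of users by the turning point can be sketched as a simple degree split. The threshold value of 250 used below is only a hypothetical placeholder inside the observed range [200, 300], the treatment of a degree exactly equal to the threshold is an arbitrary assumption, and the user names and degrees are made up.

```python
def split_by_threshold(deg, k_t=250):
    """Split users into those at or below k_t and those above it.

    k_t = 250 is a hypothetical value inside the observed range [200, 300];
    counting a degree equal to k_t on the lower side is an assumption.
    """
    lower = {u for u, k in deg.items() if k <= k_t}
    upper = set(deg) - lower
    return lower, upper

def fraction_at_or_below(deg, k_t=250):
    """Fraction of users whose degree does not exceed k_t."""
    lower, _ = split_by_threshold(deg, k_t)
    return len(lower) / len(deg)

# Hypothetical degrees for four users.
deg = {"a": 12, "b": 180, "c": 251, "d": 1098}
lower, upper = split_by_threshold(deg)
print(sorted(lower), sorted(upper))
print(fraction_at_or_below(deg))
```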

Online social networks have gained great popularity worldwide.
However, people's attitudes toward them gradually change with more
intensive use. Nowadays, logging in to a Facebook account is no
longer purely for entertainment but has become a habit and a part of
everyday life, just like checking email or browsing a web page. Many
users become rational in using social sites: they carefully maintain
a well-connected neighborhood, most of which is "cloned" from their
offline social networks. The motive of these so-called "rational
users" for using online social networks is thus as simple as
maintaining friendships. They stick to this new form of friendship
maintenance because it shortens the distance between friends: they
can learn what is happening to their friends through the news feed
with no geographic constraints. "Rational users" correspond to the
users below $k_T$. Users beyond the threshold, in contrast, form
another type. We define them as "aggressive users", as they
aggressively accumulate a large number of friends while most of their
relationships are inactive and lack interaction.

Table 1: Datasets.

Dataset       |V|       |E|        k_max   <k>     C
Livejournal   5203764   48709773   15017   18.72   0.27

So far, however, we have drawn our conclusions from only one Facebook
dataset. Does the magic threshold exist in other online social
network samples? If it exists, do users behave distinctively on the
two sides of the threshold? In the next subsection, we employ more
datasets, both larger and smaller, to discuss these issues further.

4.2 Pervasiveness 

In this section, to demonstrate the ubiquity of the phenomena found
above, we introduce five more datasets of online social networks. The
first four datasets are provided by [21] and are all publicly
available. They are the complete Facebook networks whose ties lie
within four American universities: Georgetown University
(Georgetown), Princeton University (Princeton), University of
Oklahoma (Oklahoma), and University of North Carolina at Chapel Hill
(UNC). The fifth dataset was collected from LiveJournal and is
denoted as Livejournal. It is also available to the research
community [IT]. Detailed descriptions of these datasets are listed in
Table 1.

Next, we perform the same measurements on these five datasets as on
NewOrleans. As shown in Figure 3, for all the measures, including the
CCDF, $C(k)$, $k_{nn}(k)$, and $w(k)$, there still exists a threshold
$k_T \in [200, 300]$, which is independent of $|V|$. In particular,
the Livejournal network has as many as five million nodes, yet $k_T$
is still in the range between 200 and 300.

In addition, as users' demographic data are provided in the
Georgetown, Oklahoma, Princeton, and UNC datasets, we can examine the
homophily [16] property of these networks. In these datasets, we
investigate homophily with respect to the following attributes:
student or faculty flag, gender, major, second major, dorm, year of
enrollment, and high school. The attribute vector of a node $i$ is
defined as $H_i(a_i^1, a_i^2, \ldots, a_i^7)$, where $a_i^l$
($l = 1, 2, \ldots, 7$) corresponds to the above attributes in order.
We define a binary distance for each attribute:
$\| a_i^l - a_j^l \| = 1$ when $a_i^l \neq a_j^l$, and
$\| a_i^l - a_j^l \| = 0$ otherwise. The homophily distance between
nodes $i$ and $j$ is then defined simply as

$$H_{ij} = \sum_{l=1}^{7} \| a_i^l - a_j^l \|. \tag{11}$$

The average homophily distance of node $i$ is defined as

$$\bar{H}_i = \frac{1}{k_i} \sum_{j \in N_i} H_{ij}, \tag{12}$$

where $k_i$ is the degree of $i$ and $N_i$ is the set of $i$'s
neighbors. The average homophily distance of the nodes with degree
$k$ is then defined as

$$H(k) = \frac{\sum_{\{i \in V \mid k_i = k\}} \bar{H}_i}{|\{i \in V \mid k_i = k\}|}. \tag{13}$$

Figure 3: Results from other datasets. (f) Livejournal

Figure 4: Homophily property.

As shown in Figure 4, $H(k)$ increases explosively beyond the same
degree value $k_T$. Since $H(k)$ measures how much users' attributes
differ from those of their friends, this supports our inference that
"rational users" tend to establish friendships with people they know
in the real world, who usually share common demographics such as dorm
or major, whereas "aggressive users" add as many friends as possible
somewhat aimlessly, so the demographics of their friends vary a lot.
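
The homophily distance can be sketched as an attribute-mismatch count. The code below assumes that the distance between two users is the unnormalized number of attributes on which they differ and that a user's average distance is the mean over that user's friends; whether the original measure is normalized is not known here. The attribute tuples follow the order (flag, gender, major, second major, dorm, year, high school), and all names and values are hypothetical.

```python
from collections import defaultdict

def homophily_distance(attrs, i, j):
    """Number of attributes on which users i and j differ (assumed form)."""
    return sum(1 for a, b in zip(attrs[i], attrs[j]) if a != b)

def node_homophily(adj, attrs, i):
    """Mean homophily distance between user i and i's friends."""
    return sum(homophily_distance(attrs, i, j) for j in adj[i]) / len(adj[i])

def homophily_by_degree(adj, attrs):
    """H(k): mean node-level homophily distance over nodes with degree k."""
    groups = defaultdict(list)
    for i in adj:
        groups[len(adj[i])].append(node_homophily(adj, attrs, i))
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

# Hypothetical users: (flag, gender, major, second major, dorm, year, school).
attrs = {
    "u": ("student", "F", "cs", "math", "d1", 2006, "hs1"),
    "v": ("student", "M", "cs", "none", "d1", 2006, "hs2"),
    "w": ("faculty", "M", "ee", "none", "d9", 1999, "hs3"),
}
adj = {"u": {"v", "w"}, "v": {"u"}, "w": {"u"}}
print(homophily_distance(attrs, "u", "v"))
print(homophily_by_degree(adj, attrs))
```

A larger value means that a user's friends resemble the user less, so a rising $H(k)$ corresponds to weaker homophily.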

Remark. The above observations confirm that our findings from the
NewOrleans dataset hold pervasively in online social networks. The
homophily property further strengthens our conjecture that many
connections of "aggressive users" are established with no substantial
meaning. That is to say, the threshold phenomenon is not a special
feature of the sampled dataset but a general characteristic of online
social networks.

All the previous analysis, based on the six datasets, relies mainly
on the topological features of the networks. However, purely
structural information is not fully convincing as a representation of
the interaction between users. It is therefore necessary to validate
our conjectures with real-world traces of interactions, which are
introduced in the next subsection.

4.3 Validation 

Interaction data from online social sites is generally hard to
collect because these sites always have some form of privacy
protection. We use two datasets, from [23] and [18], respectively.
Both datasets come from Facebook, are publicly available, and are
anonymized for research purposes.

The first dataset was collected from the Facebook network in the city
of New Orleans and is related to the NewOrleans dataset mentioned in
the previous sections. The authors of [23] collected the publicly
accessible profile pages and extracted the list of "Wall" posts, so
we denote this dataset as NewOrleans-Wall. The "Wall" is a popular
Facebook feature through which a user can leave messages on his
friends' profile pages, and the friends can reply by leaving messages
too. It is a classic and easy way of interacting on Facebook. The
dataset covers 46952 users.

The second dataset, denoted as Facebook-Applications, was collected
through Facebook applications developed by the authors of [18]. They
developed three popular Facebook applications named GotLove?, HUG,
and Fighters' Club. The authors collected traces over about three
weeks starting from March 20, 2008. Here we use only the data from
GotLove? and HUG. In GotLove?, a user can send 'love' to its friends,
and in HUG, a user A can send a virtual 'hug' to a friend B. The
GotLove? dataset contains 642088 active users and the HUG dataset
contains 198379 active users.

Based on these traces of interaction on Facebook, we relate the
degree of nodes to their activity strength in order to validate our
previous findings. In the NewOrleans-Wall dataset, we define the
activity strength of node $i$ as the length of its list of wall
posts, denoted as $L_i$. The longer the list of wall posts, the more
interactions take place between $i$ and its neighbors. The average
activity strength of the nodes with degree $k$ is then defined as

$$L(k) = \frac{\sum_{\{i \in V \mid k_i = k\}} L_i}{|\{i \in V \mid k_i = k\}|}. \tag{14}$$
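
The per-degree activity strength can be sketched as follows, assuming the wall trace is available as (author, wall owner) pairs and that a user with no posts on their wall has activity strength 0. The record format, function names, and toy data are hypothetical and do not reflect the actual layout of the NewOrleans-Wall dataset.

```python
from collections import Counter, defaultdict

def wall_lengths(posts, nodes):
    """L_i: number of posts on each user's wall (0 if none, by assumption)."""
    counts = Counter(owner for _author, owner in posts)
    return {i: counts[i] for i in nodes}

def average_by_degree(deg, value):
    """Mean of value[i] over the nodes that share the same degree."""
    groups = defaultdict(list)
    for i, k in deg.items():
        groups[k].append(value[i])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

# Hypothetical trace: (author, wall owner) pairs, plus user degrees.
posts = [("b", "a"), ("c", "a"), ("a", "b")]
deg = {"a": 2, "b": 1, "c": 1}
L = wall_lengths(posts, deg)
print("L_i:", L)
print("L(k):", average_by_degree(deg, L))
```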



For the Facebook-Applications dataset, we simply extract the 
number of 'love' or 'hug' messages that node i has sent 
($s_i$) and received ($\gamma_i$). The reciprocation of a 
node i, denoted as $r_i$, can then be defined as the ratio 
of $\gamma_i$ to $s_i$, i.e., $r_i = \gamma_i / s_i$. The 
averaged reciprocation of the nodes with degree k can be 
defined as 



$$ r(k) = \frac{\sum_{\{i \in V \mid k_i = k\}} r_i}{|\{i \in V \mid k_i = k\}|} \qquad (15) $$
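Equation (15) can be sketched in the same way. The snippet below is only an illustration: it assumes the per-node reciprocation is the number of received messages divided by the number of sent messages, and it skips nodes that sent nothing because the ratio is undefined for them (the original text does not say how such nodes are handled). All names and data are hypothetical.

```python
from collections import defaultdict


def reciprocation_by_degree(degree, sent, received):
    """Average the ratio received/sent over nodes with the same degree."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, k in degree.items():
        s = sent.get(node, 0)
        if s == 0:
            # Assumption: nodes that sent nothing are left out of the average.
            continue
        sums[k] += received.get(node, 0) / s
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in counts}


# Hypothetical toy data for four users.
degree = {"u": 1, "v": 1, "w": 5, "x": 5}
sent = {"u": 2, "v": 4, "w": 10, "x": 0}
received = {"u": 2, "v": 4, "w": 5, "x": 3}
r = reciprocation_by_degree(degree, sent, received)
print(r)
```

A value near 1 for a degree class means that what its users send is matched by what they receive; values far from 1 indicate asymmetric interaction.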





Figure 6: Reciprocation varies with k. 



Observation 2. Figure 6 shows that the reciprocation of the 
nodes approaches 1 when $k < k_\tau$, which means that 
almost all the 'love' and 'hug' messages sent by lower-degree 
users are reciprocated. However, r(k) diverges when 
$k > k_\tau$, indicating that these users' interaction with 
friends is not symmetric. As illustrated in Section 4.1, 
users accumulate many friends either because of popularity 
or eagerness for prestige. We can thus infer that some values 
are far below 1 because these users' actions are ignored by 
their friends, while some are above 1 because the users are 
popular enough to receive 'love' and 'hug' from many fans. 

Remark The above experiments further validate our conjecture 
that there exists a degree threshold in online social 
networks. If a user's degree is higher than the threshold, 
the user cannot maintain all of its online friendships well, 
and some of the friendships can easily be ignored by the 
friends at the other end. 

Summary By validating against real-world traces of online 
user interactions, we can reasonably conclude that there is 
still an upper limit on the number of friendships in online 
social networks, just like Dunbar's number in offline social 
networks. If users have more friends than the limit, it is 
impossible for them to treat each tie equally. Because of 
this, unusual dynamics emerge when the degree reaches the 
limit. As a result, we see that high-degree users maintain 
relationships of low overall quality. Going a step further, 
we believe that users with moderate degrees are "rational 
users" motivated to maintain old friendships, while users 
who have more friends than the limit are likely to be 
"aggressive users" who are always seeking new friends. 

However, many current models pay little attention to these 
phenomena and cannot fully explain how they are generated. 
We therefore aim to understand how the limit arises by 
presenting a new model in the next section. 



Figure 5: NewOrleans-Wall 

Observation 1. As shown in Figure 5, L(k) increases quickly 
as k increases. However, once k goes beyond the range 
[200,300], L(k) stops increasing and begins to fluctuate. 
This is consistent with our earlier finding from the purely 
topological data. The fluctuation of L(k) implies that some 
of the users with degrees higher than the threshold 
$k_\tau$ have shorter lists of wall posts and that their 
interaction remains at a rather low level. 



5. A NEW MODEL 

In this section, we present a new model to explain the 
generation of the upper limit found in the previous sections. 
We start by summarizing users' online behaviors from the 
previous observations and conclusions. Next, we introduce a 
model that inspired ours and point out its shortcomings when 
applied to online social networks. Then we incorporate the 
characteristics of users' online behaviors to propose the 
new model. Finally, we examine the properties of our 
simulated network and compare them with the real networks. 

As discussed in the previous sections, users add online 
friends according to the following rules: 

1. When users first register on the sites, they tend to 
search for their offline acquaintances, and the network 
system also recommends some friends based on the users' 
profiles. These friends become their initial online 
contacts and provide the basis for further friend making. 

2. Even more conveniently than in the real world, users can 
set up connections with their friends' friends simply by 
viewing a friend's list and adding the people they already 
know. Besides, some sites also recommend friends' friends 
to help users find more friends online. 

3. As illustrated in the previous sections, most users are 
"rational users" who are trapped within a magic-number 
circle. After their number of friends reaches a certain 
upper limit, they stop adding more friends, or they may 
even reject others' invitations to become friends. Only a 
few sociable users jump out of the circle to become 
"aggressive users", who actively add as many friends as 
possible. In fact, this process results in random linkage, 
since the "aggressive users" do not have explicit target 
users to link to. 

4. Though maintaining online friendships costs almost no 
money or time, "unfriending" still occurs. For example, 
Section 4.1 implies that the ties between "aggressive 
users" and their friends are fragile and can vanish. 
Moreover, social sites like Facebook "pull" all friends' 
updates into the users' news feed; some users may be 
annoyed by continually receiving one specific friend's 
messages, so they "unfriend" that friend and the link is 
removed. 

In order to model the growth of social networks, Jin et 
al. [13] proposed two models based on three general 
principles. First, individuals tend to meet those who have 
one or more common friends with them. Second, acquaintances 
between individuals who rarely meet decay over time, which 
means some ties may vanish. Third, there is a maximum degree 
limit for an individual. However, the features of online 
social networks differ from these assumptions. For instance, 
an online social network usually starts to evolve from a 
real-world social network: the site urges users to invite 
their real-world friends or provides an easy way to search 
for them on the site. Another difference is the limit on the 
degree. In their models, a node cannot have a degree higher 
than the maximum degree. But in online social networks, the 
maximum value set by the site may be high, such as 5000 in 
Facebook or even more. Because of this, their models can only 
control the maximum degree of the nodes and cannot explain 
the threshold degree that we find. It is also worth noting 
that in their models, the constraints of time and cognition 
are associated only with the control of the maximum degree. 
Given these shortcomings, we model the online social network 
based on the following assumptions: 

• The network starts to evolve from an existing social 
network. Here, we simply start it from a BA network. 
The network evolves only by adding or removing ties. 

• New ties between nodes with common friends are preferred. 
However, for a node with a degree higher than a threshold, 
the probability that a tie is established between its 
neighbors is lower. 

• Some nodes may search for and add a random node as a 
friend. 

• Some ties may vanish, especially for the nodes with 
high degrees. 

Guided by these principles, we present a simple model, called 
BA-shift, as follows: 

• Step 1: Load a BA network, denoted as $BA(V, E_0)$, 
where V is the set of nodes and $E_0$ is the set of original 
ties. 

• Step 2: In each time unit, perform the following actions: 

– Action 1: Select a node i with the probability 

$$ p(i) = \frac{k_i (k_i - 1) f(k_i)}{\sum_{j \in V} k_j (k_j - 1) f(k_j)}, \qquad (16) $$

where $k_i$ is the current degree of i. Here we use 
$f(k_i)$ to constrain the nodes with degrees higher than 
the threshold $k_\tau$. We define 

$$ f(k_i) = \frac{1}{e^{\beta (k_i - k_\tau)} + 1}, \qquad (17) $$

where $\beta$ is a parameter to control the extent of the 
constraint. If $k_i \geq 2$, randomly select two of its 
neighbors and establish a tie between them if they are 
not already connected. Repeat this action 

$$ \frac{1}{2} c \sum_{i \in V} k_i (k_i - 1) \qquad (18) $$

times, where c is the speed of adding new ties. 
– Action 2: Select a node q with probability 

$$ p(q) \propto k_q + 1. \qquad (19) $$

If $k_q \geq 1$, select one of its neighbors randomly 
and remove the tie between them. Repeat this action for 

[Equation (20): expression not recoverable from the source] 

times, where d is the speed of removing ties. 
– Action 3: Randomly select a pair of nodes and 
add a tie between them if they are not already 
connected. Repeat this action $|V| r$ times, where 
r is the speed of adding random links. 

• Step 3: If the current averaged degree of the network 
reaches $\langle k \rangle_{max}$, stop evolving and return 
the network. Otherwise, advance the evolving time and jump 
back to Step 2. 
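The three steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not confirm: the constraint is taken to be the Fermi-type function $f(k) = 1/(e^{\beta(k - k_\tau)} + 1)$; Action 1 is repeated $\frac{1}{2} c \sum_i k_i(k_i - 1)$ times; Action 2 picks nodes with probability proportional to $k_q + 1$ and is repeated $d|E|$ times (a guess, since the original count is not recoverable); selection weights are computed once per time unit; and the starting BA network uses m = 2 ties per new node. All function names and the small demo parameters are hypothetical.

```python
import math
import random


def ba_graph(n, m, rng):
    """Simple Barabasi-Albert graph stored as a dict of neighbor sets."""
    adj = {v: set() for v in range(n)}
    endpoints = []  # each node appears once per incident tie
    targets = list(range(m))
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
        chosen = set()
        while len(chosen) < m:  # degree-proportional choice of m targets
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return adj


def f(k, k_tau, beta):
    """Assumed Fermi-type constraint on nodes above the threshold."""
    x = beta * (k - k_tau)
    return 0.0 if x > 700 else 1.0 / (math.exp(x) + 1.0)


def ba_shift_step(adj, c, d, r, beta, k_tau, rng):
    nodes = list(adj)
    deg = {v: len(adj[v]) for v in nodes}
    pairs = sum(deg[v] * (deg[v] - 1) for v in nodes)

    # Action 1: close triangles around nodes picked with weight k(k-1)f(k).
    w1 = [deg[v] * (deg[v] - 1) * f(deg[v], k_tau, beta) for v in nodes]
    if sum(w1) > 0:
        for i in rng.choices(nodes, weights=w1, k=round(0.5 * c * pairs)):
            if len(adj[i]) >= 2:
                a, b = rng.sample(sorted(adj[i]), 2)
                adj[a].add(b)
                adj[b].add(a)

    # Action 2: remove ties; count d * |E| is an assumption of this sketch.
    n_edges = sum(len(adj[v]) for v in nodes) // 2
    w2 = [len(adj[v]) + 1 for v in nodes]
    for q in rng.choices(nodes, weights=w2, k=round(d * n_edges)):
        if adj[q]:
            nb = rng.choice(sorted(adj[q]))
            adj[q].discard(nb)
            adj[nb].discard(q)

    # Action 3: add |V| * r random ties (no effect if already connected).
    for _ in range(round(len(nodes) * r)):
        a, b = rng.sample(nodes, 2)
        adj[a].add(b)
        adj[b].add(a)


def ba_shift(n, k_max, c, d, r, beta, k_tau, m=2, seed=0, max_steps=1000):
    rng = random.Random(seed)
    adj = ba_graph(n, m, rng)
    for _ in range(max_steps):
        if sum(len(s) for s in adj.values()) / n >= k_max:
            break  # Step 3: target averaged degree reached
        ba_shift_step(adj, c, d, r, beta, k_tau, rng)
    return adj


# Small hypothetical run; the paper's own parameters are far larger.
net = ba_shift(n=300, k_max=6.0, c=0.01, d=0.01, r=0.01,
               beta=8.0, k_tau=30, max_steps=200)
print(sum(len(s) for s in net.values()) / len(net))
```

The `max_steps` cap is a safeguard added for this sketch so that the loop terminates even if the target averaged degree is never reached.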

Remark 1. In the first step we choose to load a BA network 
simply because it is classical and simple. Many real-world 
networks, including social networks, have been found to be 
scale-free. 

Remark 2. In Step 2, Action 1, nodes with higher degrees are 
more likely to be selected, which increases the closeness 
among their friends. However, when a node's degree exceeds 
the threshold $k_\tau$, its probability of being selected 
decreases sharply. This corresponds to the situation in which 
a node's degree has exceeded the threshold: some of its 
online friendships are weak, and it is hard for its neighbors 
to get acquainted through it. This is the essential part of 
the model and what distinguishes it from the others. 
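To see how sharply the selection probability drops around the threshold, the short sketch below evaluates the constraint function for a few degrees. It assumes the Fermi-type form $f(k) = 1/(e^{\beta(k - k_\tau)} + 1)$ and uses $\beta = 8$ and $k_\tau = 200$, the values that appear in the Figure 7 parameter list; the function name `fermi_constraint` is hypothetical.

```python
import math


def fermi_constraint(k, k_tau=200, beta=8.0):
    """Assumed form f(k) = 1 / (exp(beta * (k - k_tau)) + 1)."""
    x = beta * (k - k_tau)
    if x > 700:
        # exp(x) would overflow a float; the value is effectively zero.
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)


for k in (100, 199, 200, 201, 300):
    print(k, fermi_constraint(k))
```

Under this assumed form the function is close to 1 just below the threshold, exactly 0.5 at $k = k_\tau$, and close to 0 just above it, so a large $\beta$ makes the threshold behave almost like a hard cut-off.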

Remark 3. In Step 2, Action 2, nodes with higher degrees are 
more likely to be selected to lose a random acquaintance, 
because high-degree nodes can easily neglect some friendships 
under the constraints of time and cognition. 

Remark 4. In Step 2, Action 3, a pair of nodes is randomly 
selected and connected. This corresponds to the phenomenon 
that some friendships are established casually on online 
social sites. For instance, some users may search for 
strangers with common interests or simply accept invitations 
from people they do not know. 

In the following simulations, we denote the network generated 
by the model as BA-shift($|V|$, $\langle k \rangle_{max}$, 
c, d, r, $\beta$, $k_\tau$). The BA network we use 
originally contains 20000 nodes and 39973 ties. As shown in 
Figure 7, for all the local measures, including C(k), 
$k_{nn}(k)$ and w(k), there exists a turning point near 
$k_\tau = 200$. This result is consistent with the 
real-world datasets. For the BA network, in contrast, C(k) 
and $k_{nn}(k)$ keep decreasing steadily with k, while w(k) 
just increases without any descending tendency. Compared 
with other models, such as JGN and BA, BA-shift pays more 
attention to understanding how the constraints of time and 
cognition affect the evolution of online social networks. 
The aim of the model is to reveal how the threshold we find 
is generated. 

Figure 7: BA-shift(20000, 20, 0.0005, 0.0005, 0.0001, 8, 200) 

Summary In this section, we present a simple model to 
explain the generation of the upper limit. Compared to 
Dunbar's number, the value of the limit in online social 
networks is greater. We believe this finding offers useful 
insight into the current state of online social networks. 

6. BUSINESS INSIGHTS 
6.1 Online viral marketing 

Thanks to the extensive growth of online social networks in 
the recent decade, a new strategy for marketing has been 
deployed. Nowadays we often see comments on a particular 
product on our Facebook friends' walls, or advertisements in 
the form of tweets from the people we follow on Twitter. 
These are instances of online viral marketing, in which 
product information spreads directly from person to person 
(word-of-mouth) within the networks and influences people's 
purchasing decisions. Viral marketing in online social 
networks may be quite effective, as people may seriously 
consider friends' recommendations. However, questions remain 
about how to do it and where to start. Leskovec et al. [15] 
suggest that, unlike in epidemic spreading models, 
high-degree nodes are not so influential in viral marketing. 
This conclusion is well supported by our observations, since 
"aggressive users" with thousands of friends only interact 
with a small group of friends. We validate this further by 
introducing the measure $k_s(k)$, the averaged k-shell value 
of the nodes with degree k. As shown in Figure 8, the 
averaged k-shell value stops increasing and remains stable 
with slight fluctuations after the threshold, meaning that 
the core effect is not obvious for high-degree nodes. As 
illustrated in [14], some high-degree users are surrounded by 
a large number of low-degree users in the periphery, so that 
they themselves are also positioned in the periphery and play 
a trivial role in spreading product information. 
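The averaged k-shell value per degree can be computed along the following lines. This is a minimal pure-Python sketch that uses the standard peeling procedure for k-shell decomposition; the function names and the toy graph are hypothetical, and the simple loop is not optimized for large networks.

```python
from collections import defaultdict


def k_shell(adj):
    """Return the k-shell index of every node by iterative peeling."""
    deg = {v: len(nb) for v, nb in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if deg[v] <= k]
        if not peel:
            k += 1  # nothing left to remove at this level
            continue
        for v in peel:
            shell[v] = k
            remaining.discard(v)
            for u in adj[v]:
                if u in remaining:
                    deg[u] -= 1
    return shell


def average_k_shell_by_degree(adj):
    """Average the k-shell index over nodes that share the same degree."""
    shell = k_shell(adj)
    sums, counts = defaultdict(float), defaultdict(int)
    for v, nb in adj.items():
        sums[len(nb)] += shell[v]
        counts[len(nb)] += 1
    return {k: sums[k] / counts[k] for k in counts}


# Hypothetical toy graph: a hub with four leaves, plus a separate triangle.
adj = {
    "h": {"l1", "l2", "l3", "l4"},
    "l1": {"h"}, "l2": {"h"}, "l3": {"h"}, "l4": {"h"},
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
}
ks = average_k_shell_by_degree(adj)
print(ks)
```

In the toy graph the hub has the highest degree but sits in the 1-shell, while the triangle nodes with degree 2 sit in the 2-shell, which mirrors the point that a high degree does not imply a position in the core.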



Figure 8: Variations of $k_s(k)$ with k (datasets: NewOrleans, Georgetown). 

From what we have discussed in the previous sections, we 
believe that, because of their loosely connected 
neighborhoods and lack of interaction, the messages sent out 
by "aggressive users" may easily be ignored. In contrast, 
"rational users" are tightly linked to each other, and their 
online friends largely overlap with their offline friends. 
They are therefore more likely to be trusted, and their 
messages are valued highly by their friends. Consequently, 
their friends may be influenced by their purchasing 
suggestions and behaviors. 

6.2 Privacy management in online social networks 

In order to guarantee a more authentic social network, 
most online social sites require users to provide authentic 
personal information when they first register. However, this 
raises privacy concerns, as users' profile information such 
as ZIP code, gender and birthday may be stolen for improper 
use [9]. 

At the same time, users, especially the "rational users", 
have begun to recognize the necessity of privacy protection 
in online social networks. Since these users have "moved" 
quite a number of offline friends to the social sites, they 
regard the sites as a personal space for interacting with 
their friends, so the interaction can be quite private, even 
secret. In view of this, current social sites such as 
Facebook have already started to provide privacy settings. 
Users can determine for themselves whether to reveal their 
information only to friends or to the public. In this 
setting friendships are regarded as binary; that is, all the 
privacy settings apply equally to every friendship. However, 
as shown in our discussions, users cannot treat each tie 
equally. "Rational users" indeed have different attitudes 
toward their online friends, so they may desire a more 
detailed and flexible mechanism that enables them to have 
different privacy settings for different groups of friends. 

However, things are different for "aggressive users". They 
do not care much about privacy; instead, they are willing to 
disclose their information to more users in order to gain 
popularity. Another phenomenon that should be noted is the 
existence of "spammers" in online social networks. "Spammers" 
disguise themselves as "aggressive users", usually with fake 
profiles of celebrities, and establish a large number of 
connections with the intention of carrying out identity 
theft [10], which is a great threat to online users. To 
detect such "spammers", we can examine their interaction 
records, because they only add friends but do not interact 
at all. 
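The detection idea in the last sentence can be sketched as a simple filter over degree and interaction counts. This is only an illustration: the function name, the data, and the cut-off `min_degree=300` (taken from the upper end of the [200,300] range discussed earlier) are assumptions, and a real detector would need more evidence than a zero interaction count.

```python
def flag_possible_spammers(degree, interactions, min_degree=300):
    """Return users with many friends but no recorded interaction."""
    return sorted(
        user
        for user, k in degree.items()
        # Assumption: a missing interaction record counts as zero.
        if k > min_degree and interactions.get(user, 0) == 0
    )


# Hypothetical toy data: user -> number of friends, user -> interactions.
degree = {"alice": 150, "bob": 2400, "carol": 900, "dave": 3100}
interactions = {"alice": 40, "bob": 0, "carol": 12}
suspects = flag_possible_spammers(degree, interactions)
print(suspects)
```

Users flagged this way are only candidates; ordinary "aggressive users" who interact rarely would need to be separated from them by other means.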

7. CONCLUSION 

Just as in offline social networks, there is still a magic 
upper limit on the number of friendships that users can 
effectively maintain in online social networks. Through 
extensive experiments and validations, we conclude that users 
whose circles of friends stay within the magic number are 
"rational users". They mainly use online social networks for 
the purpose of maintaining their old friendships. In 
contrast, "aggressive users" go beyond the magic number with 
the desire to make as many new friends as possible. We also 
propose a new online social network model to further explain 
users' online behaviors. We think the findings of the new 
magic number and the distinction between the two kinds of 
users are helpful for viral marketing and privacy management 
in online social networks. 